13:00
2026-06-18
dev.to
large-language-models
Ninety-one percent accurate is not what it sounds like
An analysis by Oumi of Google's AI Overviews found that while accuracy improved from 85% on Gemini 2 to 91% on Gemini 3 on the SimpleQA benchmark, the rate of ungrounded claims among correct answers i…